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CANDECOMP/PARAFAC Decomposition of 
High-order Tensors Through Tensor Reshaping 

Anh Huy Phan*, Petr Tichavsky and Andrzej Cichocki 

Abstract 

In general, algorithms for order-3 CANDECOMP/PARAFAC (CP), also coined canonical polyadic 
decomposition (CPD), are easily to implement and can be extended to higher order CPD. Unfortunately, 
the algorithms become computationally demanding, and they are often not applicable to higher order 
and relatively large scale tensors. In this paper, by exploiting the uniqueness of CPD and the relation of 
a tensor in Kruskal form and its unfolded tensor, we propose a fast approach to deal with this problem. 
Instead of directly factorizing the high order data tensor, the method decomposes an unfolded tensor with 
lower order, e.g., order-3 tensor. On basis of the order-3 estimated tensor, a structured Kruskal tensor 
of the same dimension as the data tensor is then generated, and decomposed to find the final solution 
using fast algorithms for the sti'uctured CPD. In addition, strategies to unfold tensors ai^e suggested and 
practically verified in the paper. 

Index Terms 

Tensor factorization, canonical decomposition, PARAFAC, ALS, structured CPD, tensor unfolding, 
Cramer-Rao induced bound (CRIB), Cramer-Rao lower bound (CRLB) 

I. Introduction 

CANDECOMP/PARAFAC ID, 13, also known as Canonical polyadic decomposition (CPD), is a 
common tensor factorization which has found applications such as in chemometrics Q-lHl, telecom- 
munication HI, Q, analysis of fMRI data HI, time- varying EEC spectrum [Ql, ifTOl . data mining lillil . 
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|[T2l . separated representations for generic functions involved in quantum mechanics or kinetic theory 
descriptions of materials |[T3l . classification, clustering |[T4|. compression ifTSl - lfTTl . Although the original 
decomposition and applications were developed for three-way data, the model was later widely extended to 
higher order tensors. For example, P. G. Constantine et al. fTSl] modeled the pressure measurements along 
the combustion chamber as order-6 tensors corresponding to the flight conditions - Mach number, altitude 
and angle of attack, and the wall temperatures in the combustor and the turbulence mode. Hackbusch, 
Khoromskij, and Tyrtyshnikov ||f9l and Hackbusch and Khoromskij EOl investigated CP approximation 
to operators and functions in high dimensions. Oseledets and Tyrtyshnikov fTV\ approximated the Laplace 
operator and the general second-order operator which appears in the Black-Scholes equation for multi- 
asset modeling to tackle the dimensions up to = 200. In neuroscience, M. M0rup et al. [9] analyzed 
order-4 data constructed from EEG signals in the time-frequency domain. Order-5 tensors consisting of 
dictionaries X timeframes x frequency bins x channels x trials-subjects |E2l built up from EEG signals 
were shown to give high performance in BCI based on EEG motor imagery. In object recognition (digits, 
faces, natural images), CPD was used to extract features from order-5 Gabor tensors including hight x 
width x orientation x scale x images [221. 

In general, many CP algorithms for order-3 tensor can be straightforwardly extended to decompose 
higher order tensors. For example, there are numerous algorithms for CPD including the alternating least 
squares (ALS) algorithm HI, IS with Une search extrapolation methods HI, 121, ||23l-||25l, rotation CSl 
and compression ETll . or all-at-once algorithms such as the OPT algorithm |[28l . the conjugate gradient 
algorithm for nonnegative CP, the PMF3, damped Gauss-Newton (dGN) algorithms ||5l, |[29l and fast dGN 
Il30l - ll32l . or algorithms based on joint diagonalization problem Il33l- ll35l . The fact is that the algorithms 
become more complicated, computationally demanding, and often not applicable to relatively large scale 
tensors. For example, complexity of gradients of the cost function with respect to factors grows linearly 



for a tensor of size 



n=l 



with the number of dimensions A^. It has a computational cost of order O 
h X I2 X ■ ■ ■ X In. More tensor unfoldings Y(„) (n = 2,3, . . . ,N - I) means more time consuming due to 
accessing non-contiguous blocks of data entries and shuffling their orders in a computer. In addition, line 
search extrapolation methods lU, lH, fSl, 1231 . 1241 . |[36l become more complicated, and demand high 
computational cost to build up and solve {2N - 1)- order polynomials. The rotation method 129 needs 
to estimate rotation matrices of size RxR with a whole complexity per iteration of order 0{N^R^). 

Recently, a Cramer-Rao Induced Bounds (CRIB) on attainable squared angular error of factors in 
the CP decomposition has been proposed in |[37l . The bound is valid under the assumption that the 
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decomposed tensor is corrupted by additive Gaussian noise which is independently added to each tensor 
element. In this paper we use the results of ll37l to design the tensor unfolding strategy which ensures 
as little deterioration of accuracy as possible. This strategy is then verified in the simulations. 

By exploiting the uniqueness of CPD under mild conditions and the relation of a tensor in the Kruskal 
form [38] and its unfolded tensor, we propose a fast approach for high order and relatively large-scale 
CPD. Instead of directly factorizing the high order data tensor, the approach decomposes an unfolded 
tensor in lower order, e.g., order-3 tensor. A structured Kruskal tensor of the same dimension of the data 
tensor is then generated, and decomposed to find the desired factor matrices. We also proposed the fast 
ALS algorithm to factorize the structured Kruskal tensor. 

The paper is organized as follows. Notation and the CANDECOMP/PARAFAC are briefly reviewed 
in Section [ill The simplified version of the proposed algorithm is presented in Section Jn] Loss of 
accuracy is investigated in Section IIII-AI and an efficient strategy for tensor unfolding is summarized 
in Section IIII-B I For difficult scenario decomposition, we proposed a new algorithm in Section [TVl 
Simulations are performed on random tensors and real-world dataset in Section |Vl Section IVTl concludes 
the paper. 

n. CANDECOMP/PARAFAC (CP) decomposition 
Throughout the paper, we shall denote tensors by bold calligraphic letters, e.g., A e ^'i>^hx-xiN ^ 
matrices by bold capital letters, e.g., A =[«!, a2> ■ ■ ■ > £ K.^^^, and vectors by bold italic letters, e.g., 
Uj or I = [/i,/2, - - - ,/yv]. A vector of integer numbers is denoted by colon notation such as A; = i:j = 
[i, i + I, . . . , j - 1,7']. For example, we denote l:n = [1,2, . . . , n]. The Kronecker product, the Khatri-Rao 
(column-wise Kronecker) product, and the (element-wise) Hadamard product are denoted respectively by 
®,0,® El, 1391. 

Definition 2.1: (Kruskal form (tensor) IH, iBSl) A tensor X e ]R^ix/2x -x/a, Kruskal form if 

X = Y,A,a[''oaf'o...oaf\ (1) 

= P;A('>,A(2),...,A(^>1, ^=[^1,^2,...,^^]. (2) 

where symbol "o" denotes the outer product, A^"^ = [a^"\ . . . , aj^"^] e MJ"""^, {n ^ 1,2, ...,N) are 
factor matrices, al^^^a^"^ = 1, for all r and n, and Ai > A2 > ■ ■ ■ > Ar > 0. 

Definition 2.2: (CANDECOMP/PARAFAC (CP) ffl, (1, 01, EU ) Approximation of order-A^ data 
tensor y e r^ix/ix -x/a, ^ j-ank-T? tensor in the Kruskal form means 

y - y + £, (3) 



November 19, 2012 



DRAFT 



4 



TABLE I 

Complexities per iteration of major computations in CPD ALGORITHMS. J — 0/j=l Ad T — [ In- 



Computing Process Complexity 

Gradient [D, G) O(NRJ) 

Fast gradient ED 0(RJ) 

(Approximate) Hessian and its inverse (U, (291 0[r^T^) 
Fast (approximate) Hessian and its inverse 1311 . 1371 0{r^T + N^R^^ 

Exact line search (2, H, (5] 0{2^Rj) 

Rotation |26l 0[n^R'^) 



where ^ = A<^\ A*^), . . . , A^^^I, so that ||y - y\\l is minimized. 

There are numerous algorithms for CPD including alternating least squares (ALS) or all-at-once 
optimization algorithms, or based on joint diagonaUzation. In general, most CP algorithms which factorize 
order-A^ tensor often face high computational cost due to computing gradients and (approximate) Hessian, 
line search and rotation. Table U summarizes complexities of major computations in popular CPD algo- 
rithms. Complexity per iteration of a CP algorithm can be roughly computed based on Table U For exam- 
ple, the ALS algorithm with line search has a complexity of order 0{NRJ+2^RJ+NR^) = Oi2^RJ+NR^). 

III. CPD OF UNFOLDED TENSORS 

In order to deal with existing problems for high order and relatively large scale CPD, the following 
process is proposed: 

1) Reduce the number of dimensions of the tensor ^ to a lower order (e.g., order-3) through tensor 
unfolding which is defined later in this section. 

2) Approximate the unfolded tensor by an order-3 tensor in the Kruskal form. Dimensions 
of ^[Z] which are relatively larger than rank R can be reduced to R by the Tucker compression 
B3l - P6]| prior to CPD although it is not a lossless compression. In such case, we only need to 
decompose an Rx Rx R dimensional tensor. 

3) Estimate the desired components of the original tensor y on basis of the tensor ^j/j in the Kruskal 
form. 

The method is based on an observation that unfolding of a Kruskal tensor also yields a Kruskal 
tensor. Moreover due to uniqueness of CPD under "mild" conditions, the estimated components along 
the unfolded modes are often good approximates to components for the full tensor. In the sequel, we 
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introduce basic concepts that will be used in the rest of this paper. Loss of accuracy in decomposition 
of the unfolded tensors is analyzed theoretically based on the CRIB. 

Definition 3.1 (Reshaping): The reshape operator for a tensor y e ^hxhx-xiN iq ^ gj^e specified 
by a vector L = [Li,L2, . . . ,Lm] with Y\m=i^"i = 11^=1 4 returns an order-M tensor OC, such that 
vecCy) = vec(X), and is expressed as X = reshapeCy, L) e r^ix^2x -xl„ 

Definition 3.2 (Tensor transposition [47]): If yi e ^hx-^'N and is a permutation of [\,2, . . . ,N], 
then A^''^ e R^''i ^'"^'pn denotes the p-transpose of A and is defined by 

A^P^{ip^,...,ip,) = A(ii,...,iN), l<i<I = [h,h,...JN]. (4) 

Definition 3.3 (Generalized tensor unfolding): Reshaping a p-transpose y^^^ to an order-M ten- 
sor of size L ^ [LuL2,. . . ,Lm] with L,„ = where [l^h, ■ ■ ■ Jm] = [pi, Pi, ■ ■ ■ , Pn], hn = 

[/^(l), . . . , /m(^m)] 

y„ = reshape(y<^>, L), l = [h,h,..., Im\. (5) 

Remark 3.1: 

1) If Z = [n,{l:n - l,n + l:N)], then = Y(„) is mode-n unfolding. 

2) If y is an order-4 tensor, then ^[1,2,(3,4)1 order-3 tensor of size Ii x I2X 73/4. 

3) If y is an order-6 tensor, then y[(i,4),(2,5),(3,6)i is an order-3 tensor of dimension /1/4 x 72/5 x 73/6. 

N 

We denote Khatri-Rao product of a set of matrices U^"', « = 1, 2, . . . , A^, as O U^"^ = U^^' OU^^"" 

Lemma 3.1: Unfolding of a rank-/? tensor in the Kruskal form y = A^'\ A'^-^ . . . , A^'^^J returns an 
order-M rank-/? Kruskal tensor I = [hJi, ■ ■ ^m], given by 

y„ - I't;B<'',B(2),...,B(^)]|, (6) 

where B^™' = (Dk€i,„ A'^*'' e RCHfe/,,, (m = 1, 2, . . . , M) are merging factor matrices. 
Remark 3.2: 

1) If Z - [n,(\:n -\,n + \:N)], then y„ = y(„) - A^"' diag(i) (0^^„ A^)''. 

2) If Z - [(hn), (n + l:N)l then y„ = (O^i A®) diag(i) {Ot,^, A«)^ 

3) For an order-4 Kruskal tensor y, 1,2,(3,4)1 = U; A^^K A^^'^ Q A^^^. 

Corollary 3.1: An order-A' tensor !B„,^ of size //__ , x Ii^^ x ■ ■ ■ x l^^^^^, Z„, = [/„,i . . . 1,„k\ folded from the 
r-th column vector A^'"' of B^'"', i.e., vec(!B„,J = is a rank-1 tensor 

= a<'""> o a<''"^) o . . . o flC""^'. (7) 
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Algorithm 1: rank-one FCP 



Input: Data tensor y : (/i x /2 x • • • x Iff), rank R, 

Unfolding rule I = [li,l2, ■ ■ ■ , Im] where l,„ = [/„(!), . . . , lm{Km)] 
Output: AeR^, N matrices A*"' e R^"^^ 
begin 

% Stage 1 : Tensor unfolding and optional compression 

|[S,U^'', ■ ■ . ,U**^']| = TDCy|[;]|,min(I,/?)) % Tucker decomposition of order-M 



% Stage 2 : CPD of the unfolded (and reduced) tensor 

|[i;B(i),...,B(^'| = CPD(g,7?) 

for m = 1, 2, . . . , M do B^"' ^ U^™' B^ 

% Stage 3 : Rank-one approximation to merging components 
for m = 1,2, ...,M do 
for r = 1,2, do 



% order-M CPD of the core tensor 
% Back projection of TD 



A,- <^ Ar I 



. , a<''->l - TD(reshape(*("'\ Uiji^, I,jkJ), 1) 



TD{y,R): rank-/? Tucker decomposition of order-A^ tensor y where R = [/?i,7?2, ■ ■ ■ ,Rn]- 

y = CPD (y, R, yinit)'- approximates an order-A^ tensor or a tensor in the Kruskal form ^ by a rank-7? 

Kruskal tensor ^ using initial values yinit- 



In practice for real data, folded tensors are not exact rank-1 tensors but can be approximated by 
rank-1 tensors composed from components corresponding modes in /„,. In other words, computing the 
leading-left singular vector of the mode-/: unfolding [!B,„J(j.j is the simplest approach to recover flr"'*' 
from for = 1,2, . . . , A'. Pseudo-code of this simple algorithm for unFolding CPD (FCP) is described 
in Algorithm [T] The more complex and efficient algorithm is discussed later. 

A. Selecting an unfolding strategy 

For (noiseless) tensors which have exact rank-7? CP decompositions without (nearly) collinear compo- 
nents, factors computed from unfolded tensors can be close to the true solutions. However, for real data 
tensor, there exists loss of accuracy when using the rank-one approximation approach. The loss can be 
affected by the unfolding, or by the rank-7? of the decomposition, especially when 7? is under the true rank 
of the data tensor. This section analyzes such loss based on comparing CRIBs on the first component 
a^^^ of CPDs of the full tensor and its unfolded version. We use Ui a shorthand notation for a'^\ The 
results of this section give us an insight into how to unfold a tensor without or possibly minimal loss of 
accuracy. 

The accuracy loss in decomposition of unfolded tensor is defined as the loss of CRIB |[37]| . P8]| . P9]| on 
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components of the unfolded tensor through the unfolding rule / compared with CRIB on components of 

the original tensor. For simplicity, we consider tensors in the Kruskal form (12.11) which have a|."'^ o'f' = Cn, 

for all n, r i= s, and = l,-l<c„<l. Coefficients c„ are called degree of coUinearity. 

1) Loss in unfolding order-4 tensors: For order-4 tensors, since CRIB(ai) is largely independent of 

c\ unless c\ is close to +1 |f37l|, we consider the case when ci - 0. Put h - C2C3C4 and 9 = From 

''1 

II37I and Appendix IbI we get CRIBs for rank-2 CPD in explicit forms: 

3/i2 



CRIB(ai) = 

CRIB|[i_2,(3,4)](«l) = 
CRIB|[l,(2,3),4)]|(«l) = 



1 -/j2 

9 
9 

1 -/j2 



1 + 



7,-3 + 



2 2 2 2 2 2 
^2^3 ^2^4 ^3^4 



1 "H 2/z ^2^3 ^2^4 ^3^4 



1 



1 



3 + 



1 - c^c^ 
1 C2C3 



2 2 



1 

^ ^-4 



In general, CRIB(ai) < CRIB|[i_2,(3,4)](ai). The equality is achieved for C2 - 0. 



CRIB(ai)c2=o - CRIB|[i^2,(3,4)](«i), 



C2=0 



2 + 



1 



(8) 
(9) 
(10) 

(11) 



It means that if modes 1 and 2 comprise (nearly) orthogonal components, the tensor unfolding [1,2, (3,4)] 
does not affect the accuracy of the decomposition. 

From ^ and (fTOl l. it is obvious that CRIB|[i_2,(3,4)i(«i) ^ CRIBji (2,3),4)]|(ai) if c\ < c\. This indicates 
that collinearities of modes to be unfolded should be higher than those of other modes in order to 
reduce the loss of accuracy in estimating ay. Note that the new factor matrices yielded through tensor 
unfolding have lower colUnearity than those of original ones. Moreover, tensors with high collinear 
components are always more difficult to decompose than ones with lower coUinearity ll29l . ll50l . lISTTl . 
Hence, it is natural to unfold modes with highest coUinearity so ffiat the CPD becomes easier. This rule 
also holds for higher rank R, and is illustrated in a particular case when ci = C3 = 

(12) 
(13) 



CRIBii,2,(3,4)](«i) = 9\h-R + 



CRIB 



Il,(2,3),4] 



(«l) - 



R + 



R- 


-n 


1 - 


r2 


R- 


■l\ 


1 - 


c2 



The unfolding [1,2,(3,4)] is more efficient than the unfolding [1,(2, 3), 4] when \c2\ < \ca\, although this 
unfolding still causes some loss of CRIB despite of C4 since 



CRIB(ai) - 9 



h-R + 



R-\ 

1 ~ ^2^\j 



< 9\h -R + 



R-\ 



1 



= CRIB|[i,2,(3,4)](ai), for aU C2. 



Moreover the loss is significant when C4 is small enough. Note that for this case, the unfolding [1,3, (2,4)] 
is suggested because it does not cause any loss according to the previous rule. 
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In other words, modes which comprise orthogonal or low collinear components (i.e., c„ w 0) should 
not fold with the other modes, unless the other modes have nearly orthogonal columns as well. 

Example 1 We illustrate the similar behavior of CRIB over unfolding but for higher-order ranks. We 
decomposed ^ [1,2,(3,4)] unfolded from rank-7? tensors of size RxRxRxR with 7? = 3, 5, ... , 30, corrupted 
with additive Gaussian noise of 10 dB SNR. There was not any significant loss in factors when modes 1 
and 2 comprised low-coUinear components despite of coUinearity in modes 3 and 4 as seen in Figs. |l(a)| - 
|l(c)| For all the other cases of (ci,C2, 03,04), there were always significant losses, especially when all 
the factors comprised highly collinear components (i.e., c„ is close to ±1) as seen in Figs. |l(d)P(f) 
When the first mode has nearly collinear factors, i.e., ci is close to ±1, we have ||37l 

9{h - 1) 



CRIB(fli),,=ii 



1 -/l2 



< CRIB (a 1) 



ci=0' 



(14) 



but the expressions for the folded tensor decomposition remain unchanged. It means that the loss occurs 
as seen in Fig. |l(d)| and Fig. 1(f) 

2) Loss in unfolding order-5 tensors: For order-5 rank-2 tensors, we consider the case when c\ - 0, 



and put h - C2Ct,ca,cs. CRIBs of decompositions of the full and unfolded tensors are given by 



CRIB (a 1) = 

CRIB|[l,2,(3,4,5)l(«l) = 



9 



CRIB 



Il,(2,3),(4,5)l 



(«l) 



1 -/l2 



1 -/l2 



1 -/l2 



h-l + 

/i - 3 + 
/i - 3 + 



^-4/i2 
1 + 3/i2 - ^ 
1 



\-cl 



1 _ Ac-^c'^ 



1 



1 



„2 2 
'2^3 



1 



2 2 



(15) 
(16) 
(17) 



where ^ - c\c\c\ + c^c^c^ + c\c\c^^ + c^c^Cy From ([T6l ) and (fTTl) . it is obvious that CRIB|[i,2,(3,4,5)](ai) < 
CRIB|[i,(2,3),(4,5)j(ai) if < c^Cj. This rule coincides with that for order-4 tensors to reduce the collinearity 
of the merging factor matrices as much as possible. For C2 = 0, the expressions ([TSl l and (fT6l ) become 
identical, but expression ([T7] ) is larger, in general. 

For higher order tensors, we analyze the CRIB loss in decomposition of order-6 tensors with assumption 
that Ci = C2 = ■ ■ ■ = C(, = c through 5 unfoldings /i = [1,2,3,4,(5,6)], I2 = [1,2,3,(4,5,6)], 1^ = 
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40 r 



35 r 



30 



25 



(a) (ci,C2,C3,C4) = (0.1,0.1,0.1,0.1). 




(C) (Cl,C2,C3,C4) = (0.1,0.1,0.9,0.9). 




15 




(e) (ci,C2,C3,C4) = (0.1,0.9,0.9,0.9). 





|Factor-1 

I I Factor-2 
I I Factor-3 
Factor-4 
I ^CRIB 

'OKI 

20 30 



(b) (ci,C2,C3,C4) = (0.1,0.1,0.9,0.1). 




(d) (ci,C2,C3,C4) = (0.1,0.9,0.1,0.9). 




|Factor-1 
I I Factor-2 
I I Factor-3 
^1 Factor-4 

□crib 

iciii: 

20 30 



(f) (ci,C2,C3,C4) = (0.9,0.9,0.9,0.9), SNR = 30 dB. 



Fig. 1. Median SAE of all components for factors over 30 Monte Carlo runs vs CRIB in decomposition of order-4 tensors of 
size 1,1 = R = 3, 5, 10, . . . , 30, for all n through the unfolding rule / = [1, 2, (3, 4)]. Correlation coefficients c„ have been chosen 
from the set 6 {0.1,0.9), for all n. The signal to white Gaussian noise power ratio (SNR) was at lOdB or 30dB. 
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[1,2,(3,4,5,6)], h = [1,(2, 3), (4, 5, 6)] and h - [1,2, (3, 4), (5, 6)] 



CRIB (a i) 
CRIB,,(ai) 
CRIB,,(ai) 
CRIB,3(ai) 
CRIB/,(ai) 
CRIB/,(ai) 



1 - (1 - Cl0)(l - C8)(l + 3c2 + c4)(l + c2 + 6c4 + + C^) 

c\6c^ + llc^ + lc^ + 5c^ + 1) 
(1 - ciO)(l - c8)(l + c2)(l + 2c2 + 6c4 + 2c6 + c^) 
0^(40*^ + 3c4 + 2c2 + 1) 



h 


- 1 


1 - 


-clo 


h 


- 1 


1 - 


-clo 


/l 


- 1 


1 - 


.^10 


h 


- 1 


1 - 




/l 


- 1 


1 - 




h 


- 1 


1 - 


.^10 



+ 



(1 -Cl")(l -C^)(l +3c2 +c4) 

c\2c^ + c'^ + + I) 
(1 -ci")(l -c^) 

c'^(l +c'^)(2c^ + 2c^ + 1) 

(1 - Cl0)(l - c8)(l + c2 + c4)(l - C + c2) 

0*^(20*^ + 2c4 + 2c2 + l)(c'^ + 4c4 + 3c2 + 2) 



(1 - ciO)(l - c^)(l + c2 + c4)(c^ + 3c6 + 6c4 + 3c2 + 1) 
It holds CRIB < CRIB,, < CRIB/, < CRIB,, < CRIB,^ < CRIB,, (see Fig. |2(a)] for 6* - 1 and /i = R). 
Fig. [2(b)] illustrates such behavior of the CRIB losses £ = -lOlogig ( cRIB^l' ) ) '''^^^ higher rank 

R = 20 and the tensor size !„ - R = 20, for all n. The CRIB loss is insignificant when components are 
nearly orthogonal (i.e., c — > 0), and is relevant for highly coUinear components, i.e., c — > 1. Unfolding 
li causes a CRIB loss less than 1 dB, while unfolding I2, h and 1^ can cause a loss of 3, 5 and 7 dB, 
respectively. The result confirms that two-mode unfolding causes a lesser CRIB loss than other rules. The 
unfoldings 1 4 and Is are more efficient than multimode unfoldings I2 and Ij,, respectively in decomposition 
of unfolded tensors of the same orders. 

3) A case when two factor matrices have orthogonal columns: As pointed out in (II II ) that there is 
not any loss of accuracy in estimation of A^'' and A'^^) through unfolding when these two factor matrices 
have mutually orthogonal columns. The result also holds for arbitrary order-A^ rank-7? tensors which have 
orthogonal components on two modes. In such case, the analytical CRIB is given by 
Theorem 3.1 ( 1.37]): When A^'' and A^^) have mutually orthogonal columns, it holds 



CRIB(«0 = '^L-R^f^^X y,.f](a;")V")),r = 2,3,..., 



R. 



(18) 



It is obvious that CRIB|[i_2,(3.a')](<<'i) = CRIB(fli). Hence, estimation of A*'^ and A^^) through unfolding 
is lossless in terms of accuracy. 

An important observation from Theorem 13.11 is that all the factor matrices in CPD with orthogonality 
constraints ||52l . ||53l can be estimated through order-3 tensors unfolded from the original data without 
any loss of accuracy. That is an algorithm for order-3 orthogonally constrained CPD on two or three 
modes can work for arbitrary order-A^ tensors. 
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■■■■ [1:4,(5,6)], 5-D 

[1:3,(4,5,6)1, 4-D 

■- -1,2,(3,4), (5,6), 4-D 

[1,2,(3:6)1, 3-D 

[1,(2,3), (4:6)1, 3-D 




(a) CRIB (dB) for order-6 rank-2 tensors. 



(b) CRIB loss for rank-20 tensors. 



Fig. 2. The CRIB loss in decomposition of order-6 lank-R tensors with size /„ = R and correlation coefficients c„ = c, for all n 
following 5 unfolding rules. The CRIB loss is significant when components are coUinear, i.e., c — > 1. Unfolding / = [1.-4, (5,6)] 
causes a lesser CRIB loss than other rules. The unfolding / = [1, 2, (3, 4), (5, 6)] is more efficient than multimode unfolding rule. 



B. Unfolding strategy 

Based on the above analysis of CRIB loss, we summarize and suggest an efficient unfolding strategy 
to reduce the loss of accuracy. Without loss of generality, assume that < |ci| < \c2\ < • • • < Ica^I < 1, the 
following procedures should be carried out when unfolding an order-A^ tensor to order-M (typically, M 
= 3) 

• Unfold two modes which correspond to the two largest values \c„\, i.e., (A^- 1,A^). This yields a new 
factor matrix with a correlation coefficient cn-i = cn-iCn- The tensor order is reduced by one. 

. Unfold two modes which correspond to the two largest collinearity values among (A^ - 1) values 
[ci,C2,...,cn-2,cn-i\- This can be (A^-3,A^-2) if \cn-i\ < \cnX otherwise, {N -2,N - l,N). The 
new correlation coefficient is cn-^cm-i or cn-2Cn-\Cn- 

• Unfold the tensor until its order is M. 

In addition, (nearly) orthogonal modes should not be merged in the same group. For order-4 tensor, the 
unfolding [1,2,(3,4)] is recommended. 

Example 2 We decomposed order-5 tensors with size /„=/? = 10 and additive Gaussian noise 
of 10 dB SNR. Correlation coefficients of factors matrices were [0.1, 0.7, 0.7, 0.7, 0.8]. Three tensor 
unfoldings h = [(1,4, 5), 2, 3], h = [1,2,(3,4,5)] and h ^ [1, (2, 3), (4, 5)] were applied to the order-5 
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tensors. Unfolding Zi = [(1,4, 5), 2, 3] caused the largest loss of 2 dB with an average MSAE = 36.62 
dB illustrated in Fig. IIII-CI The recommended unfolding h according to the above strategy achieved an 
average MSAE = 38.29 dB compared with the average CRIB - 38.67 dB on all the components. 

Example 3 Unfolding tensors with the same colinearity in all modes. We verified the unfolding rules 
for order-6 tensors with simplified assumption that c\ = C2 = • • ■ = C(, = c. Since correlation coefficients 
are identical, the unfoldings I = [1,2, (3,4), (5, 6)] is one of the best rules. Fig. |2(b)| shows that the CRIB 
loss by / = [1,2,3,(4,5,6)] was higher than the loss by / = [1, 2, (3,4), (5, 6)]. 

C. Unfolding without collinearity information 

For real-world data, although collinearity degrees of factor matrices are unknown, the above strategy 

is still applicable. Since the decomposition through tensor unfolding decomposes only an order-3 tensor, 

the computation is relatively fast. We can try any tensor unfolding, and verify the (average) collinearity 

Y,r ,\ar^ay\ 

degrees of the estimated factors c„ - — — — — , (« = 1, 2, . . . , A^), to proceed a further decomposition 

R{R - 1) 

with a new tensor unfolding. 

Example |2tb) We replicated simulations in Example [2l but assumed that there was not prior collinearity 
information and the bad unfolding rule l\ - [(1,4, 5), 2, 3] was applied. The average coUinear degrees of 
the estimated factor matrices c„ = 0.0989, 0.7007, 0.6992, 0.7021, 0.8014, for « = 1, . . . , A^, respectively, 
indicated that the unfolding [(1,4, 5), 2, 3] is not a good one. The unfolding Ij, = [1, (2, 3), (4, 5)] was 
then suggested, and a further decomposition was proceeded. This improved the MSAE about 2dB. 
For more examples, see decomposition of the ITPC tensor in Example [5] when R = 8. 

IV. Fast Approximation for High Order and Difficult scenario CPD 

An appropriate unfolding rule can reduce the loss of the decomposition. However, the loss always 
occurs when factors have high collinearity or unfolding orthogonal modes. Moreover, in practice, a rank- 
R CPD may not fit the data tensor. This could happen when R is not exactly the true rank of the tensor. 
Especially, for under-rank CPD, the error tensor £ can still be explained by tensors in the Kruskal 
form. In this case, components of the merging factor matrices tend to comprise information of the other 
components in higher rank CP decomposition. Hence, they are no longer rank-one matrices/tensors, and 
approximation of merging components by rank-one tensors cannot yield good approximate to the true 
factors. To this end, low-rank approximations to merging components are proposed, and components are 
estimated through two major stages 
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39 
38 
S37 

LU 
< 

5 36 
35 
34 

[{1,4,5),2,3] [1,2,(3,4,5)] [1 ,(2,3),(4,5)] 

Folding 

Fig. 3. Affect of unfolding rules to the accuracy loss in decomposition of order-5 tensors of size I,, = R = 10 with cj = 0.1, 
C2 = c'3 = C4 = 0.7 and cs = 0.8. Mean SAEs (dB) were computed for all the components over 100 Monte Carlo runs. 

1) Construct an order- structured Kruskal tensor "9/ from the order-M rank-7? Kruskal tensor which 
approximates the unfolded tensor ^iij. often has higher rank than R. 

2) Approximate "^y by a rank-7? Kruskal tensor which is the final result. 

The algorithm is first derived for unfolding two modes, and extended to multimode unfolding. 

A. Unfolding two modes 

We consider a rank-/? CPD of y and a simple unfolding / - [1, . . . , - 2, (A^ - 1, A^)] 

yn = j]A,b\^^ob^^^o...obr'' + e.. (19) 

Assume matrices F,. = reshape(&^'^"'*, Un-i x In]) have rank-/,. { I < J, <^ In-i), i-c, F,. = U,-LrVf , for 
r = 1, 2, ... ,7?, where E,. = diag(crr) and singular values cr^ = [o"^i, cr,.2, . . . , cr^y,], 1 > (Xri > cr^i > • • • > 
o'rj,. > 0, H^^j cr^y = 1. By replacing all b[^~^^ by matrices and for r = 1,2, .. . ,7?, and replicating 
components (« = 1,2, . . . ,A^- 1) 7^ times, we generate an order-A^ rank-7 tensor in the Kruskal form 
where J -Y^r^r- 

R 

Lemma 4.1: The order-A^ rank-7 Kruskal tensor = U; A'-^\ . . . , A'-^-^\ A^^'>J, where J = ^ /,, 



I. 




I 





Factor- 1 




^1 Factor-2 




1 1 Factor-3 




Factor-4 




Factor-5 




□ crib 




nn y m 
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(R < J <^ Rimn{lN-i,lN), = 
Ui U2 



^ll/i /lily, ■■■ ^ftly^ 



l\ and 



; («) - 



Vi V2 ... v« 

M = blkdiag(lixy,,lix7,, 



9/3x7 



/4X7 



e 

■ ^ llx7fi), 



« - l,...,A^-2, 
n = A^- 1, 

V,. = diagCcr^) « - A/', 



has the same approximation error of the best rank-7? CPD of ^ j/j, i.e., Hy - y /||^ = ||£|||. 

If Jr = 1 for all r, y / is the approximation y of the true factors as pointed out in the previous 
section. Otherwise, the order-A^ rank-7 Kruskal tensor is approximated by a rank-/? Kruskal tensor 
y. Note that this procedure does not access and manipulate on the real data y. For example, the mode- 
n CP-gradients of with respect to A® (k 4^ n, n = 1,2, ... ,N) which are the largest workload in 
CP algorithms such as ALS, OPT, dGN can be quickly computed as illustrated in Appendix lAl with a 



low computational complexity of order O 



+ N 



It means that computation of 



JR{In-i + In) + 

(Yy(„) - Y(„)) {Qki^n A*^*^') is much faster than the computation on the raw data (Y(„) - Y(„)^ (0*:^« A*^*^^^. 
In other words, estimation of factors A*^"^ from the Kruskal tensor is relatively fast. 

When the matrices (r = 1,2, ...,R) have not exact low-rank representation, we consider their 

J, 

truncated SVDs such that pr = ^ crj.j > r , «: r < 1 . The parameter r controls the loss of fit by low 
rank approximations. The higher r, the lower loss of fit, but the higher the approximation ranks. In the 
simulations, we set r > 0.98. 

Let denote the solution of the rank-one FCP algorithm (i.e., using Algorithm [T) 



Ar;B^'\ . . . ,B(^-2', [M„,M2i, . . .,Uri] , [vn, V21, . . . , Vki]| , 



where Ar = [Aio-\\,A2cr2i, Arctri] e R^. It is straightforward to see that b^" '' = 11 



(20) 



min(/jv_i,/jv) 



O-rq (Vrq •■ 



u,.q), r=\,2,...,R. Because B^"' = [b^"\ b'j^'], {n = \, . . . ,N -\), forms the best rank-/? CPD of y^q, 
each vector (TrqiVrq^Uyq), (r = 1, . . . ,/?), contributes to achieve the optimal approximation error ||£||f in 
([T9l ). Discarding any set of singular components {Vrq ® Urq) will increase the approximation error. The 
more singular vectors to be eliminated, the higher approximation error of It means that the tensor 
has higher approximation error than the tensor "9 j. That is 



ii£iil<iiy-y7iif <iiy-y«iif, 

or performance of FCP using low-rank approximations is better than that using Algorithm [T] 



(21) 
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B. Unfolding M modes 

Consider an example where unfolding M modes are / = [I, . . . ,N - M,{N - M +1, . . . ,N)]. In this case, 
truncated SVD is no longer applied to approximate tensors 5",. of size In-m+\ x ■ ■ ■ x In and vec(3^r) - 
Ij(n-m)^ However, we can apply low -rank Tucker decomposition to 3^^ 

3^,.«IS.;U^1\UP\...,U<'^'1, (r=l,...,R), (22) 

where U^.'"' are orthonormal matrices of size (In-m+iii-i x T^m), and the core tensor §, = [■^^'^l is of size 

r,i X r,2 X ■ ■ ■ X r,M, t = [tut2,..., imI 

In order to estimate an order- rank-7? Kruskal tensor y. Tucker tensors in (l22l ) are converted to 
an equivalent Kruskal tensor of rank-(rir2 . . . Tm)- However, we select only the most J,- dominant s) 
{I < Jr ^ T1T2 . . . Tm), t eT = {t[, t2,..., tj^} among all coefficients of so that 

Pr = Yj^s\y>T\\%4. (23) 

The tensors 3^r have rank-/^ approximations in the Kruskal form 

P,;U(')Mh,UP)M,2,...,U!.^>M,m1, (r - 1, 2, . . . ,/?), 

where = [s\''\ . . . , ^''l and M™ (m = 1, 2, . . . , M) are indicator matrices of size r„, x which 
have only non-zero entries at M,.,„ (?,;,„, 7) = 1, t j = f,;2> ■ ■ tj^M] ^ T~, j = I, . . . , Jr- 

Combination of 7? rank-/,. CP approximations for components bf^~^^ yields a rank-7 Kruskal tensor 
(J = Jr) as mentioned in the previous section (see Lemma I4.1I ). A rank-7? CPD of ^7 will give 
us an approximate to the true solution. 

An alternative approach is that we consider M-mode unfolding as (M - 1) two mode unfoldings. 
For example, since (1,2,3) = (1,(2,3)), the factor matrices are then sequentially estimated using the 
method in Section IIV-AI Indeed, this sequential method is recommended because it is based on SVD 
and especially low-rank approximation to matrix is well-defined. 

C. The proposed Algorithm 

When the tensor ^ is unfolded by a complex unfolding rule / which comprises multiple two-modes or 
M-mode unfoldings such as / = [(1,2), (3, 4, 5), (7, 8)], construction of a rank-7 structured Kruskal tensor 
becomes complicated. In such case, the factor reconstruction process in section IIV-AI or section IIV-BI 
is sequentially applied to mode to be unfolded. In Algorithm [2j we present a simple version of FCP 
using low-rank approximation to merge components. The algorithm reduces the tensor order from 
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Algorithm 2: FCP 



Input: Data tensor ^ : (/i x /2 x • • • x Iff), rank R, threshold t(> 0.98) 

Unfolding rule I = [liJi, ■ ■ ■ , ImI L = UmW, lm(2), LiKm)] 
Output: AeR^, N matrices A*") e R^"^^ 
begin 

% Stage 1 : Tensor unfolding and compression 

|[S,U^'', ■ ■ . ,U**^']| = TDCy|[;]|,min(I,/?)) % Tucker decomposition of order-M 
% Stage 2: CPD of the unfolded tensor 

|[/l;B*'\ . . . ,B*'^']| = CPD(g,7?) % order-M CPD of the core tensor 

for m = 1, 2, . . . , M do A*'") ^ U<"') B^'"' % Back projection of TD 
% Stage 3: Sequential estimation of factor matrices 

M = M 

for m = M, M - 1, . . . , 1 do 
if K„j > 2 then 

for k = \,2,...,K,n do 

% Stage 3a: Construction of rank-y Kruskal tensor 

m = m + k-\,~A=U, A<™> = [], A<™) = [], M = [] 
for r - 1,2,..., 7? do 

F, = reshape(af \ O^Li /;„(,)]) 

w Ur diag(cr,.) % truncated SVD such that ||a-,.||^ > r 

A ^ [i,/l.o-,], A(^> ^ [A("'\U,], A(^+i) ^ [A(^+'\ V,] 

M ^ blkdiag(M,lixy,) 

% Stage 3b: Rank-y to rank-i? Kruskal tensor 

= P;A(i),...,A('«-i\A<'"),A(™+i),A('"+i>,...,A(^)l 
if Zf=i Jr>R then 

= P; A^i^M, . . . , A(™-i^M, A(^\ A<^+i\ A(™+i^M, . . . , A^^^MJ 
E/l;A(i>,A(2),...,A('^']| = structuredCPD(yy,7?,ys) 

M- M+ 1, 



% Stage 4: Refinement if needed 

U; A<i>, . . . , AWj - CPD(y , /?, [i; A'-^\A'-^\ A^]!) 



TD(y,/?): rank-/? Tucker decomposition of order-A^ tensor y where R = [/?i,7?2, ■ ■ ■ ,Rn]- 

y = CPD (y, 7?, y,„„): approximates an order-A^ tensor or a tensor in the Kruskal form y by a rank-7? 

Kruskal tensor y using initial values y,,,,-,. 



to M (e.g., 3) specified by the unfolding rule I = [Zi, /2, . . . , /m], where each group of modes /,„ - 
[/„,(1), h„{2), . . . , l,n{K,„)l K,„ > 1 and I^f^^ K,„ = N. 

Tucker compression can be applied to the unfolded tensor H6l . The factor matrices are sequen- 
tially recovered over unfolded modes m, i.e., modes have > 2. The sequential reconstruction of 
factors is also executed over (K^ - 1) runs. Each run constructs an order-M rank-7 Kruskal tensor, and 
approximates it by an order-M rank-7? Kruskal tensor using to initialize. It also indicates that ^ has 
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better approximation error than that of "y^. The tensor order M gradually increases from M to A^. The 
full implementation of FCP provided at http://www.bsp.brairi.rikeri.jp/~phari/terisorbox.php 
includes other multimode unfoldings. 

Although a rank-7? CPD of unfolded tensor has lower approximation error than the best rank-7? CPD 
of the original tensor, for difficult data with coUinear components or under-rank approximation (R is 
much lower than the true rank), CPDs of the unfolded tensors and structured Kruskal tensors are often 
proceeded with a slightly higher rank R. 

For some cases, a refinement stage may be needed to obtain the best solution. That is the approximation 
solution after low-rank approximations is used to initialize CPD of the raw data. This stage often requires 
lower number of iterations than CPD with random or SVD-based initializations. 

V. Simulations 

Throughout the simulations, the ALS algorithm factorized data tensors in 1000 iterations and stopped 
when e < 10~^. The FCP algorithm itself is understood as Algorithm |2] with low -rank approximation. 
Otherwise, the FCP algorithm with rank-one approximation is denoted by RIFCP ALS was also utilized 
in FCP to decompose unfolded tensors. 

Example 4 (Decomposition of order-6 tensors.) We analyzed the mean SAEs (MSAE) of algorithms 
in decomposition of order-6 tensors with size I„ = R = 20 by varying the number of low coUinear factors 
from 1 to 6 with [ci < C2 < • • • < ce]. Tensors were corrupted with additive Gaussian noise of dB SNR. 
ALS was not efficient and often got stuck in local minima. MSAEs over all the estimated components 
by ALS were clearly lower than CRIB, especially when there were 5 coUinear factors (the first test case 
in Fig |4(a)| ). The FCP method was executed with "good unfolding" rules suggested by the strategy in 
Section IIII-BI and "bad" ones which violated the unfolding strategy listed in Table Jll 

Performance of RIFCP (Algorithm [U was strongly aff^ected by the unfolding rules. Its SAE loss 
was up to 21dB with "bad unfoldings". For all the test cases, FCP with low-rank approximations (i.e.. 
Algorithm |2]) obtained high performance even with "bad unfolding" rules. In addition, FCP was much 
faster than ALS. FCP factorized order-6 tensors in 10-20 seconds while ALS completed the similar tasks 
in 500-800 seconds. Finally, in this simulation, FCP was 47 times faster on average than ALS. 

Example 5 (Factorization of Event-Related EEG Time-Frequency Representation.) This example 
illustrates an application of CPD for analysis of real- world EEG data lH, ||54l which consisted of 28 inter- 
trial phase coherence (ITPC) measurements 1,55,1 of EEG signals of 14 subjects during a proprioceptive pull 



November 19, 2012 



DRAFT 



18 



TABLE 11 

Comparison of MSAEs (in dB) of ALS and FCP with CRIB in decomposition of order-6 rank-20 tensors of size /„ = 20, for all 
n. Correlation coefficients of factors c„ are 0.1 or 0.9 and [ci < Ct < ■ ■ ■ < cg]. The performance was measured by varying the 
number of c„ = 0.1. RIFCP (J,- = 1, for all r) was sensitive to unfolding rules. 

"Good unfolding" "Bad unfolding" 



No. 0.1 


CRIB 


ALS 


Unfolding 


Alg.E 




Unfolding 


Alg.E 


Aig.m 


1 


42.33 


10.8 


[1,(2, 3, 4), (5, 6)] 


40.49 


41.95 


[2,3,(1,4,5,6)] 


31.33 


40.78 


2 


49.40 


44.3 


[1,2, (3,4,5,6)] 


48.64 


48.65 


[3,4,(1,2,5,6)] 


41.06 


47.50 


3 


52.16 


48.0 


[1,2, (3,4,5,6)] 


52.06 


52.07 


[4,5,(1,2,3,6)] 


41.94 


50.89 


4 


52.26 


46.8 


[1,2, (3,4,5,6)] 


52.23 


52.23 


[1,(2, 3), (4, 5, 6)] 


51.02 


51.31 


5 


52.26 


36.2 


[1,2,(3,4,5,6)] 


48.78 


51.24 


[(1,2), (3,4), (5, 6)] 


30.54 


51.44 


6 


52.26 


30.5 


[1,2,(3,4,5,6)] 


51.13 


52.19 


[(1,2), (3, 4), (5, 6)] 


31.04 


47.75 




ALS 

rank-1 , good folding 
rank-1 , bad folding 
low rank, good folding 
low rank, bad folding 
CRIB 



■jyiuy LJE 



3 

No. 



4 

= 0.1 



nn nf 



1000 



100 



10 



I I 



^1 rank-1 , good folding 
I rank-1, bad folding 
I low rank, good folding 

^|low rank, bad folding 



3 4 
No. c = 0.1 



(a) SAE loss. 



(b) Execution time. 



Fig. 4. Illustration of MSAE loss and execution time averaged over 30 MC runs in decomposition of order-6 rank-20 tensors 
of size I = R = 20 corrupted with additive Gaussian noise of dB SNR, and c„ 6 {0.1,0.95). 



of the left and right hands. The whole ITPC data set was organized as a 4-way tensor of 28 measurements 
X 61 frequency bins x 64 channels x 72 time frames. The first 14 measurements were associated to a 
group of the left hand stimuli, while the other ones were with the right hand stimuli. The order-4 ITPC 
tensor can be fitted by a multiUnear CP model. ]Vl0rup et al. analyzed the dataset by nonnegative CP of 
three components and Tucker components and compared them with components extracted by NIVIF and 

ICA 121. 

In this example, our aim was to compare the factorization time of ALS and FCP over various R in 
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TABLE III 

Comparison of fit (%) values in factorization of the ITPC tensor by ALS and FCP in Example[5] Rank-1 FCP (i.e., Algorithm [TJ 

COMPLETELY FAILED IN THIS EXAMPLE. StRIKETHROUGH VALUES MEAN THAT THE ALGORITHM DID NOT CONVERGE TO THE DESIRED SOLUTION. 



R 


ALS 


Rank-1 FCP 




Low-rank FCP 


Tucker — > ALS 


Tucker Alg.[2] 






[(1,2), 3, 4] 


[1,2, (3, 4)] 








5 


37.7 ± 0.17 


32.7 




36.0 


37.7 + 0.17 


36.2 +0.03 


36.9 ±0.00 


8 


43.8 ± 0.13 


26.7 




42.1 


43.8 ± 0.00 


42.7 ±0.01 


42.9 ±0.58 


11 


48.0 ± 0.04 


19.5 




32.5 


47.9 ± 0.16 


47.1 ±0.09 


47.5 ±0.08 


20 


56.0 ± 0.12 






-4ft* 


55.9 ± 0.16 


55.5 ±0.18 


55.8 ±0.13 


30 


61.1 ± 0.09 






504^ 


61.1 + 0.09 


60.8 ±0.07 


60.9 ±0.13 


40 


64.5 ± 0.08 


-449- 






64.4 ± 0.14 


64.2 ±0.08 


64.3 ±0.18 


60 


68.7 ± 0.02 


-4095- 






68.1 + 0.05 


68.6 ±0.09 


68.6 ±0.10 


72 


70.4 ± 0.02 








69.9 + 0.08 














TABLE IV 






Performance of rank-1 FCP with different unfolding rules in decomposition of the ITPC tensor in Example |5] Strikethrough 






VALUES MEAN THAT THE ALGORITHM DID NOT CONVERGE TO THE DESIRED SOLUTION. 












Rank-1 FCP 


ALS 








R Unfolding 


Fit 


Time 


CoUinearity degree 


Fit Time 








rules 


(%) 


(sees) 


[C'l, C'2, C3, C4] 


(%) (sees) 








[(1, 2), 3, 4] 


26.7 


2.15 


[0.48, 0.70, 0.54, 0.89] 










[1, (2, 3), 4] 

8 

[1, 2, (3, 4)] 


29.9 
42.1 


5.72 
5.59 


[0.37, 0.50, 0.40, 0.83] 
[0.34, 0.50, 0.49, 0.72] 


43.8 242 








[1, 3, (2, 4)] 


41.3 


4.67 


[0.49, 0.52, 0.50, 0.75] 










[(1, 2), 3, 4] 


-64t4 


5.79 


[0.43, 0.62, 0.50, 0.89] 










[1, (2, 3), 4] 

20 

[1, 2, (3, 4)] 


28.3 
15.4 


12.12 
10.39 


[0.34, 0.45, 0.56, 0.90] 
[0.33, 0.48, 0.46, 0.86] 


56.0 752 








[1,3, (2, 4)] 


-33t6 


9.17 


[0.39, 0.57, 0.69, 0.92] 







the range of [5, 72] with and without a Tucker compression prior to the CP decompositions. The FCP 
method employed ALS to factorize the order-3 unfolded tensor, and the fast ALS for the structured 
Kruskal tensors (see in Appendix lAl). Interpretation of the results can be found in ||9l, |[54l . The low-rank 
FCP algorithm was applied with the unfolding rule / = [1,2,(3,4)]. 

Execution time for each algorithm was averaged over 10 IVIonte Carlo runs with different initial values 



and illustrated in Fig. |5(a)| for various R. For relatively low rank R, a prior Tucker compression sped 
up ALS, and made it more efficient than FCP when 7? < 11. The reason is explained by compression 
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(a) Execution time (seconds) (b) Execution time as ^ = 20 

Fig. 5. Illustration of execution times (seconds) of ALS and FCP for factorization of order-4 ITPC tensor in Example |5] 
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Fig. 6. Illustration of leading singular values of matrices F, (r = 1,2, ...,R) reshaped from components estimated from the 
ITPC tensor with different unfolding rules [(1,2), 3, 4] and [1, 2, (3, 4)]. Rank-l FCP (i.e., Algorithm[TJ failed in this experiment 
because this algorithm worked only if all F, were rank-one matrices. 



time for unfolding tensor in FCP. However, this acceleration technique was less efficient as R —> I,, and 
inapplicable to ALS for R > I„. FCP significantly reduced the execution time of ALS by a factor of 5-60 
times, and was slightly improved by the prior compression. Comparison of fits explained by algorithms 
in Table In] indicates that while FCP with Algorithm |2] quickly factorized the data, it still maintained fit 
equivalent to ALS. 
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For this data, the rank-one FCP algorithm (i.e., Algorithm [T|), unfortunately, did not work well. Fits of 
this algorithm are given in Table Hill Performance of this algorithm with several unfolding rules including 
[(1,2), 3, 4], [1,(2, 3), 4], [1,2,(3,4)] and [1,3,(2,4)] is compared in Table |IVl When 7? = 8 and using 
the rule / = [(1,2), 3,4], the rank-1 FCP algorithm showed the worst performance with a fit of 26.7% 
which was not competitive to a fit of 43.8% obtained by ALS. The average collinearity degrees of the 
estimated components c„ = [0.48,0.70,0.54,0.89] indicates that we should not fold modes 1 and 2; in 
addition, folding modes 2 and 4 which had the largest coUinear degrees is suggested, i.e., the unfolding 
rule / = [1,3,(2,4)]. It is clear to see that the unfolding rule I - [1,3,(2,4)] significantly improved 
performance of the rank-1 FCP algorithm with a fit of 41.3%. Moreover, the unfolding rule I - [1, 3, (2, 4)] 
was also suggested according to the average coUinear degrees c„ = [0.37,0.50,0.40,0.83] achieved when 
applying the unfolding rule I = [1,(2, 3), 4]. This confirms the applicability of the suggested unfolding 
strategy. For this test case, the unfolding rule I = [1,2,(3,4)] allowed to achieve the best fit of 42.1%, 
although this rule was not suggested by the strategy. This can be understood due to the fact that the 
average coUinear degrees of modes 2 and 3 were very similar (0.50 and 0.49, or 0.52 and 0.50, see in 
Table lYll. 

For higher ranks, e.g., R > II, FCP with rank-one approximation completely failed. The unfolding 
strategy did not help anymore (see fit values in Table HID) . In Fig. [6l we display leading singular values 
of reshaped matrices F,. (r = 1,2, . . . ,7?) from the estimated components for 7? = 8 and 20. The results 
indicate that were not rank-one matrices, especially the matrices received when using the rule / = 
[(1,2), 3,4]. Note that the rank-one FCP algorithm works if and only if all F^ are rank-one. This also 
confirms that the low-rank FCP algorithm was appropriate for this data. 

Fig. |5(b)| illustrates the relative approximation errors s = of ALS and FCP for 7? = 20 as 

functions of the execution time. ALS took 536.5 seconds to converge. FCP took 1.2 seconds for com- 
pression, 0.9 seconds for CPD of the order-3 unfolded tensor, 2.73 seconds for low-rank approximations, 
2.1 seconds for the refinement stage. ALS and FCP converged to the relative approximation errors eals 
^ 0.4417, while spcp ^ 0.4399, respectively. 

Example 6 (Decomposition of Gabor tensor of the ORL face database.) This example illustrated 
classification of the ORL face database ll56l consisting of 400 faces for 40 subjects. We constructed 
Gabor feature tensors for 8 different orientations at 4 scales which were then down-sampled to 16 x 16 
X 8 X 4 X 400 dimensional tensor The unfolding / = [1,2,(3,4,5)] was applied to unfold y to be an 
order-3 tensor. ALS Il23l factorized both ^ and }Sm into 7? = 30,40, 60 rank-1 tensors in 1000 iterations. 
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TABLE V 

Comparison between ALS and FCP (Alg.|2J in factorization of order-5 Gabor tensor constructed from the ORL face dataset. 



R 


Algorithm 


Fit (%) 


Time 
(seconds) 


Ratio ^ 


ACC (%) 


NMI (%) 


30 


FCP 
ALS 


60.59 
60.56 


24 
927 


39 


85.00 
85.25 


92.91 
93.22 


40 


FCP 
ALS 


62.46 
62.63 


39 
1599 


41 


84.25 
85.50 


92.57 
93.68 


60 


FCP 
ALS 


65.47 
65.64 


105 
16962 


162 


83.00 
81.38 


91.62 
91.44 



and stopped when ||e - £o/j|| < 10~^e where e = \\y The rank-one FCP algorithm did not work for 

this data. For example, when R = 10 and applying the rule I = [1,2, (3,4, 5)], RIFCP explained the data 
with a fit of -31.2%, and yielded average coUinearity degrees of [0.60,0.66,0.64,0.95,0.64]. Although 
a further decomposition with the unfolding rule I = [1, (2, 3), (4, 5)] achieved a better fit of 44.8%, this 
result was much lower than a fit of 54.5% obtained by ALS and FCP. 

The factor A^^' e g^400xfi comprised compressed features which were used to cluster faces using the 
K-means algorithm. Table |V] compares performance of two algorithms including execution time, fit, 
accuracy (ACC %) and normalized mutual information (NMI). For R = 40, ALS factorized y in 1599 
seconds while FCP completed the task in only 39 seconds with a slightly reduction of fit (« 0.17%). For 
R = 60, ALS was extremely time consuming, required 16962 seconds while FCP only took 105 seconds. 
Regarding the clustering accuracy, features extracted by FCP still achieved comparable performance as 
those obtained by ALS. 

VL Conclusions 

The fast algorithm has been proposed for high order and relatively large scale CPD. The method 
decomposes the unfolded tensor in lower dimension which is often of size R x R x R instead of the 
original data tensor. Higher order structured Kruskal tensors are then generated, and approximated by 
rank-/? tensors in the Kruskal form using the fast algorithms for structured CPD. Efficiency of the strategy 
proposed for tensor unfoldings has been proven for real-world data. In addition, one important conclusion 
drawn from our study is that factor matrices in orthogonally constrained CPD for high order tensor can 
be estimated through decomposition of order-3 unfolded tensor without any loss of accuracy. Finally, the 
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proposed FCP algorithm has been shown 40-160 times faster than ALS for decomposition of order-5 and 
-6 tensors. 



Appendix A 
Algorithms for structured CPD 

A. Low complexity CP gradients 

CP gradients of y in Lemma (14.11 ) with respect to A*"^ is quickly computed without construction of y 
as follows 



(Y(„) - Y(„)) (O A^'^'j = A(") diagCl) W„ - A<")r„ 



(24) 



where W„ = (©^^^(A^")^ A^"')) and r„ = (®^^„(A(*)^ A^)). By replacing A^"' = B(")M, and employing 
(m^a) ® (m^b) = M ^ (A ® B), M ((m^a) ® b) = a ® (MB), the first term in dlDl is further expressed 
for « = l,...,A^-2 by 



A<"^ diag(^)W„ =B' 



(N-2 



k=\ 



)(B«^A«) 



K 



K 



and 



iil^((U[A(^-i))®(V^AW)) 
i«lJ((ujA(^-^))®(v,^A(^))) 
/liVfAW diag(6Ji) 



A<^-^> diag(l)W. 



N-l 



:{N-i) 



A('^Uiag(^)WA, = A 



(N) 



^sV^AW diag(6;«) 
/liU[A(^-i' diag(6Ji) 

^«UjA(^-i) diag(6J«) 



where Q = [o),] = ®k=i (A®^ B®). For each n = 1,2, ... ,A^, such computation has a low computational 



complexity of order O 



r f N-2 \ 

2 



N + 



+ JR{In-i+In) 



n=l J 

worst case, Jr « /, then J » RI, we have 



. For tensors which have /„ = /, for all n, in the 



O 



R' 



N-2 \ ^ 
+ JR{In-i+In) 



n=\ J 



= 0(NR^f) ^0(rI^), 



for N >4. 
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B. Fast algorithms for structured CPD 

By employing the fast CP gradient in previous section, most CP algorithms can be rewritten to estimate 
A^"' from the structured tensors ^ in Lemma (14.1b . For example, the ALS algorithm is given by 

-1 



fN-2 

®(B(">^A(">) 



K 



n = l,...,N -2. 



Appendix B 

Cramer-Rao induced bound for angular error 
Denote by ai the mutual angle between the true factor a^'' and its estimate a*^^ 

ai = acos— -|- J— , (25) 

Theorem B.l: ||37]| The Cramer-Rao induced bound (CRIB) on aj for rank-2 tensor is given by 

9 /■ T 1 /-I ^2^^,2 „2 , , 7,2 



CRIB(ai) = ^ 



f/l-1 {\-c])h] y2+,_h]z{Z+\) 

+ — — 7^ 7T-, — — tt: :t (26) 



l-/if (1 -cij-/i^(z+ l))2 + /i^Cv + ciz)2 



where c„ = ct'^^ d!^ , and 



/in - ]~| for n - 1,...,A^, (27) 



l<k+n 



JL 7.2/, ^2 



^ /I„^(l-C^) 
„=2 '-n "n'-l 
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